SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding
Remote sensing images are useful for a wide variety of planet monitoring
applications, from tracking deforestation to tackling illegal fishing. The
Earth is extremely diverse -- the number of potential tasks in remote sensing
images is massive, and the sizes of features range from several kilometers to
just tens of centimeters. However, creating generalizable computer vision
methods is a challenge in part due to the lack of a large-scale dataset that
captures these diverse features for many tasks. In this paper, we present
SatlasPretrain, a remote sensing dataset that is large in both breadth and
scale, combining Sentinel-2 and NAIP images with 302M labels under 137
categories and seven label types. We evaluate eight baselines and a proposed
method on SatlasPretrain, and find that there is substantial room for
improvement in addressing research challenges specific to remote sensing,
including processing image time series that consist of images from very
different types of sensors, and taking advantage of long-range spatial context.
Moreover, we find that pre-training on SatlasPretrain substantially improves
performance on downstream tasks, increasing average accuracy by 18% over
ImageNet and 6% over the next best baseline. The dataset, pre-trained model
weights, and code are available at https://satlas-pretrain.allen.ai/. (ICCV 2023)
Machine-Assisted Map Editing
Mapping road networks today is labor-intensive. As a result, road maps have
poor coverage outside urban centers in many countries. Systems to automatically
infer road network graphs from aerial imagery and GPS trajectories have been
proposed to improve coverage of road maps. However, because of high error
rates, these systems have not been adopted by mapping communities. We propose
machine-assisted map editing, where automatic map inference is integrated into
existing, human-centric map editing workflows. To realize this, we build
Machine-Assisted iD (MAiD), where we extend the web-based OpenStreetMap editor,
iD, with machine-assistance functionality. We complement MAiD with a novel
approach for inferring road topology from aerial imagery that combines the
speed of prior segmentation approaches with the accuracy of prior iterative
graph construction methods. We design MAiD to tackle the addition of major,
arterial roads in regions where existing maps have poor coverage, and the
incremental improvement of coverage in regions where major roads are already
mapped. We conduct two user studies and find that, when participants are given
a fixed time to map roads, they are able to add as much as 3.5x more roads with
MAiD.
RoadTagger: Robust Road Attribute Inference with Graph Neural Networks
Inferring road attributes such as lane count and road type from satellite
imagery is challenging. Often, due to the occlusion in satellite imagery and
the spatial correlation of road attributes, a road attribute at one position on
a road may only be apparent when considering far-away segments of the road.
Thus, to robustly infer road attributes, the model must integrate scattered
information and capture the spatial correlation of features along roads.
Existing solutions that rely on image classifiers fail to capture this
correlation, resulting in poor accuracy. We find this failure is caused by a
fundamental limitation -- the limited effective receptive field of image
classifiers. To overcome this limitation, we propose RoadTagger, an end-to-end
architecture which combines both Convolutional Neural Networks (CNNs) and Graph
Neural Networks (GNNs) to infer road attributes. The usage of graph neural
networks allows information propagation on the road network graph and
eliminates the receptive field limitation of image classifiers. We evaluate
RoadTagger on both a large real-world dataset covering a 688 km^2 area in 20 U.S.
cities and a synthesized micro-dataset. In the evaluation, RoadTagger improves
inference accuracy over CNN image classifier-based approaches. RoadTagger
also demonstrates strong robustness against different disruptions in the
satellite imagery and the ability to learn complicated inductive rules for
aggregating scattered information along the road network.
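The core idea of propagating evidence along the road graph can be illustrated with a minimal NumPy sketch. This is not RoadTagger's actual architecture: the per-segment descriptors (which in RoadTagger come from a CNN) are given directly here, and the GNN is reduced to repeated neighbor averaging, so a signal observed at one segment influences predictions for far-away segments on the same road.

```python
import numpy as np

def propagate(node_feats, adjacency, num_rounds=4, alpha=0.5):
    """Propagate per-segment features along a road graph.

    node_feats: (N, D) array of per-segment descriptors.
    adjacency: list of neighbor-index lists, one per node.
    Each round mixes a node's feature with the mean of its
    neighbors', so information flows beyond any single image patch.
    """
    feats = node_feats.copy()
    for _ in range(num_rounds):
        mixed = np.empty_like(feats)
        for i, nbrs in enumerate(adjacency):
            if nbrs:
                mixed[i] = (1 - alpha) * feats[i] + alpha * feats[nbrs].mean(axis=0)
            else:
                mixed[i] = feats[i]
        feats = mixed
    return feats

# A 4-node chain road: evidence placed at node 0 reaches node 3
# after a few rounds of propagation.
feats = np.zeros((4, 1))
feats[0, 0] = 1.0
adj = [[1], [0, 2], [1, 3], [2]]
out = propagate(feats, adj)
```

After four rounds the last node's feature is nonzero even though its own input was zero, which is exactly the long-range effect an image classifier with a limited receptive field cannot produce.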
Robust road topology extraction from aerial imagery
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. By Favyen Bastani.
Creating and updating road maps is currently an expensive and often manual process, and thus maps today are outdated or have poor coverage in large regions of the world. Automatically inferring the road network graph from aerial imagery provides a promising avenue for reducing the cost of maintaining road maps, but existing inference methods have poor precision. This thesis develops a novel iterative graph construction process for extracting graph structures from images, and applies this process to automatic road topology inference to significantly reduce error rates.
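The iterative graph construction process can be caricatured as a search that grows the graph one vertex at a time, at each step asking a learned model which direction to walk next or whether to stop. The sketch below uses a hypothetical next_step decision function as a stand-in for the thesis's CNN decision model:

```python
import math

def trace_road(start, next_step, step_len=1.0, max_steps=100):
    """Grow a road polyline from a seed point.

    next_step(pos, angle) is a stand-in for the learned decision
    function: it returns a new heading in radians, or None to stop.
    Returns the list of visited vertices.
    """
    vertices = [start]
    pos, angle = start, 0.0
    for _ in range(max_steps):
        angle = next_step(pos, angle)
        if angle is None:
            break
        pos = (pos[0] + step_len * math.cos(angle),
               pos[1] + step_len * math.sin(angle))
        vertices.append(pos)
    return vertices

# Toy decision function: walk due east until x reaches 5, then stop.
def go_east(pos, angle):
    return 0.0 if pos[0] < 5 else None

path = trace_road((0.0, 0.0), go_east)
```

Because each new vertex is conditioned on the graph built so far, this style of construction can maintain topological consistency that per-pixel segmentation followed by thinning tends to lose.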
Label-Efficient and Compute-Efficient Video Analytics
The ability to analyze large-scale video datasets is useful in an increasing range of applications. For example, a traffic planner may want to analyze traffic camera video to compare the frequency of hard braking at different junctions, while an ecology researcher may be interested in identifying instances of various behaviors between pairs of birds in video of a bird feeder. However, implementing machine learning (ML) pipelines for video analytics tasks remains challenging for two reasons. First, these tasks generally require applying expensive ML models to robustly detect and track objects such as cars and birds. These models are both label-intensive, often requiring thousands of labeled examples to achieve high accuracy, and compute-intensive, processing only tens of frames per second even on datacenter GPUs. Second, in addition to applying ML models, these tasks often require several auxiliary operations to pre-process the input video and associated metadata, and to post-process model outputs to extract useful insights. For example, counting hard braking incidents necessitates post-processing object tracks of cars to identify sharp decelerations.
In this thesis, we present SkyhookML, a platform for analytics tasks over large-scale video datasets. To reduce the cost of video analytics, we integrate approximate video query processing optimizations, efficient video pre-processing methods, and self-supervised learning techniques into SkyhookML. Approximate processing optimizations sacrifice a small amount of accuracy for large gains in throughput by avoiding applying the most accurate but also most expensive models on every video frame. Efficient pre-processing methods extract general-purpose insights from video that can be reused across several analytics tasks. Self-supervised learning techniques can substantially reduce the labeling effort needed to train robust models by deriving learning signals from unlabeled data. By employing novel approaches in each of these three categories that are specialized for analyzing object detections and tracks that appear in video data, SkyhookML addresses the label- and compute-intensiveness of video analytics and enables users to efficiently develop and deploy ML pipelines.
Thesis: Ph.D.
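One common approximate-processing pattern can be sketched as a two-stage cascade: a cheap proxy model scores every frame, and the expensive detector runs only on frames the proxy flags. This is a generic illustration of the trade-off, not SkyhookML's implementation; both models below are hypothetical stand-ins.

```python
def cascade(frames, cheap_score, expensive_detect, threshold=0.5):
    """Run the expensive detector only where the cheap model fires.

    cheap_score(frame) returns a confidence in [0, 1];
    expensive_detect(frame) returns a list of detections.
    Frames scoring below threshold are skipped entirely, trading a
    little recall for a large reduction in detector invocations.
    """
    results, calls = {}, 0
    for i, frame in enumerate(frames):
        if cheap_score(frame) >= threshold:
            results[i] = expensive_detect(frame)
            calls += 1
    return results, calls

# Stand-in models: frames are just ints; "objects" appear on even frames.
frames = list(range(10))
cheap = lambda f: 1.0 if f % 2 == 0 else 0.0
expensive = lambda f: [("object", f)]
results, calls = cascade(frames, cheap, expensive)
```

Here the expensive model runs on only half the frames; with a well-calibrated proxy, real systems can skip a far larger fraction while bounding the accuracy loss.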